Abstract

Consider an optimization problem with $n$ binary variables and $d+1$ linear objective functions.
Each valid solution $x \in \{0,1\}^n$ gives rise to an objective vector in $\mathbb{R}^{d+1}$, and one often wants
to enumerate the Pareto optima among them. In the worst case there may be exponentially
many Pareto optima; however, it was recently shown that in (a generalization of) the smoothed
analysis framework, the expected number is polynomial in $n$. Unfortunately, the bound obtained
had a rather bad dependence on $d$; roughly $n^{d^d}$. In this paper we show a significantly improved
bound of $n^{2d}$.

Our proof is based on analyzing two algorithms. The first algorithm, on input a Pareto
optimal $x$, outputs a "testimony" containing clues about $x$'s objective vector, $x$'s coordinates, and
the region of space $B$ in which $x$'s objective vector lies. The second algorithm can be regarded
as a speculative execution of the first: it can uniquely reconstruct $x$ from the testimony's
clues and just some of the probability space's outcomes. The remainder of the probability
space's outcomes are just enough to bound the probability that $x$'s objective vector falls into
the region $B$.
1 Introduction



We study the expected number of Pareto optimal solutions in multiobjective binary optimization 
problems within the framework of smoothed analysis. 

1.1 Multiobjective optimization and Pareto optima 

In a typical decision-making problem there are multiple criteria used in judging the quality of a 
solution. For example, in choosing a driving route between two points one might want to minimize 
distance, tolls, number of turns, and expected traffic; in choosing a vacation hotel one might want 
to minimize price and distance to the beach, while maximizing quality. In such cases there is rarely 
a single solution which is best on all criteria simultaneously. The most popular way to handle 
the tradeoff is to determine the set of all Pareto optimal solutions, meaning those solutions which 
are not dominated in all measures of quality by some other solution. This idea, originating in 
microeconomics, has been very extensively studied in computer science, especially in operations 
research [Ehr05], algorithmic theory [PY02], artificial intelligence [Deb01], and database theory
(under the name "skyline queries") [BKS01]. 

Even if one is not interested in Pareto optima per se, many algorithms and heuristics for solving 
optimization problems enumerate Pareto optimal solutions as an intermediary step. For example, 
the Nemhauser-Ullmann algorithm [NU69] for exactly solving the Knapsack problem works by 
iteratively computing the Pareto optimal (value, weight) pairs achievable by the first i items, for 
$i = 1, \dots, n$. Beier and Vöcking [BV04] observed that this algorithm runs in time $O(nB)$, where
B is an upper bound on the number of Pareto optima in each stage. As we describe below, this 
allowed them to give the first polynomial-time algorithm for an NP-hard optimization problem 
in the smoothed analysis framework, by performing smoothed analysis on the number of Pareto 
optimal solutions. 
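
As a rough illustration of the Nemhauser-Ullmann idea described above, the following Python sketch maintains the Pareto optimal (weight, value) pairs achievable by the first $i$ items. The function names and the tuple layout are our own assumptions and are not taken from [NU69] or [BV04]. For simplicity each stage re-sorts its list instead of merging two sorted lists, so the sketch does not attain the $O(nB)$ running time.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (weight, value); this layout is an assumption


def pareto_filter(pairs: List[Pair]) -> List[Pair]:
    """Keep the pairs not dominated by a pair that is no heavier and at least as valuable."""
    frontier: List[Pair] = []
    best_value = float("-inf")
    # Sort by increasing weight (ties: larger value first), then keep
    # only pairs whose value strictly exceeds every lighter pair's value.
    for weight, value in sorted(pairs, key=lambda p: (p[0], -p[1])):
        if value > best_value:
            frontier.append((weight, value))
            best_value = value
    return frontier


def nemhauser_ullmann(items: List[Pair]) -> List[Pair]:
    """Return the Pareto optimal (weight, value) pairs over all subsets of items."""
    frontier: List[Pair] = [(0.0, 0.0)]  # the empty knapsack
    for weight, value in items:
        # Every old Pareto pair either skips the new item or includes it.
        shifted = [(w + weight, v + value) for w, v in frontier]
        frontier = pareto_filter(frontier + shifted)
    return frontier


def best_value_within(items: List[Pair], capacity: float) -> float:
    """Solve Knapsack by scanning the final Pareto list."""
    return max(v for w, v in nemhauser_ullmann(items) if w <= capacity)
```

The size of the list `frontier` after each stage plays the role of the quantity $B$ in the running-time bound.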

Unfortunately, even in the simplest case of multiobjective optimization, that of two linear objective
functions, the number of Pareto optimal solutions may be exponentially large in the number
of decision variables. There have been two main approaches to dealing with this exponential
complexity. The first, popularized by Papadimitriou and Yannakakis [PY02], involves computing
"$\varepsilon$-approximate Pareto sets". In many cases, polynomial-size $\varepsilon$-approximate Pareto sets can be
computed efficiently; see the thesis of Diakonikolas [Dia10] for references. The second approach,
pioneered by Beier and Vöcking [BV04], involves studying multiobjective optimization in the smoothed
analysis framework.

1.2 Smoothed analysis for discrete optimization 

Smoothed analysis was introduced in an influential work of Spielman and Teng [ST04], as a framework
intermediate between worst-case and average-case analysis. Here the idea is to think of real
numbers in the input as being defined based on imprecise measurements; specifically, they are first
fixed adversarially in $[-1, 1]$, say, and then subjected to Gaussian noise with some small standard
deviation $\sigma$. In this framework, Spielman and Teng showed that a certain version of the simplex
algorithm for linear programming runs in $\mathrm{poly}(n, 1/\sigma)$ expected time.

A notable work of Beier and Vöcking [BV04] from 2003 showed that the NP-hard 0/1-Knapsack
problem can be solved in polynomial time in the smoothed analysis framework. (Previously, there
had been a long line of work on average-case analysis of 0/1-Knapsack: see, e.g., [DF89, GMS84,
Lue98].) Furthermore, they showed this holds even in a much more general model of smoothed
analysis. In one version of their model, each item's profit $P_i$ and weight $W_i$ is an independent
random variable with values in $[0, 1]$; the only restriction is that the probability density function
(pdf) of each $P_i$ and $W_i$ is upper-bounded by the parameter $\phi$. We call this model "$\phi$-semirandom".
It is easy to see that as $\phi$ is increased, the framework goes from (a very general version of) average-case
analysis to worst-case analysis. For example, given a small number $\sigma$, if we take $\phi = 1/\sigma$ then
the profits $P_i$ could be of the form $p_i + U_i$, where $p_i \in [\sigma, 1 - \sigma]$ is an adversarially chosen number
and $U_i$ is uniformly random on $[-\sigma, \sigma]$. (The original case of Gaussian noise does not quite fit in
this framework, but is easily handled with a small amount of additional work.)
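
The uniform-noise example above can be sketched in a few lines of Python. This is only an illustration of one admissible $\phi$-semirandom distribution; the function names are hypothetical, and the adversarial centers are simply passed in by the caller.

```python
import random
from typing import List


def semirandom_profits(centers: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Perturb adversarial centers p_i in [sigma, 1 - sigma] by uniform noise on [-sigma, sigma]."""
    for p in centers:
        if not (sigma <= p <= 1 - sigma):
            raise ValueError("each center must lie in [sigma, 1 - sigma]")
    # Each perturbed profit stays inside [0, 1] and the profits are independent.
    return [p + rng.uniform(-sigma, sigma) for p in centers]


def uniform_noise_density(sigma: float) -> float:
    """Density of the uniform distribution on an interval of length 2*sigma."""
    return 1.0 / (2.0 * sigma)
```

Since the density $1/(2\sigma)$ is at most $1/\sigma$, such profits are $\phi$-semirandom for $\phi = 1/\sigma$.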

1.3 Previous work 

Beier and Vöcking showed that in this $\phi$-semirandom model, the expected number of Pareto optimal
knapsacks is $O(\phi n^4)$; from this they immediately deduced that the Nemhauser-Ullmann algorithm
runs in expected $O(\phi n^5)$ time. In fact, Beier and Vöcking showed that the same is true even if
the weights are adversarially specified, and only the profits are chosen randomly (independently,
from $\phi$-bounded distributions). In this case of adversarial weights, they also showed an $\Omega(n^2)$
lower bound for the expected number of Pareto optima, even for uniformly distributed profits (i.e.,
$\phi = 1$).

In his thesis, Beier [Bei04] extended this analysis to general 2-objective binary optimization
problems. Specifically, he showed that given an arbitrary set of "solutions" $\mathcal{S} \subseteq \{0,1\}^n$ and arbitrary
2nd objective values $\mathrm{Obj}^2(x)$ for each $x \in \mathcal{S}$, if the 1st objective is linear and $\phi$-semirandom,
then the expected number of Pareto optima is still $O(\phi n^4)$. Later work of Beier, Röglin, and
Vöcking [BRV07] improved this bound to $O(\phi n^2)$ (which is tight for constant $\phi$) and also extended
it to the case of integer-valued decision variables.

These works only handled the case of 2 objectives. Recently, Röglin and Teng [RT09] extended
the analysis to the case in which there are $d + 1$ objective functions, $d$ of which are linear and
$\phi$-semirandom, and one of which is completely arbitrary. Their bound on the expected number of
Pareto optima is polynomial in $n$ and $\phi$ for constant $d$, and they were also able to polynomially
bound higher moments. Unfortunately, their result is probably of theoretical interest only, as the
dependence on $d$ is rather bad. E.g., for $d = 3$ their upper bound on the expected number of
Pareto optima is roughly $n^{97}$ assuming $n > 2^{453787938}$ (and is much worse than $n^{97}$ for smaller $n$).
In general their bound is roughly $n^{f(d)}$ for $f(d) = 2^{d-1}(d+1)!$, once $n > \exp(\exp(d^2 \log d))$.

Röglin and Teng concluded their work by asking whether the exponent $f(d)$ on $n$ could be reduced
from $d^{O(d)}$ to $\mathrm{poly}(d)$; this was later recognized as an important open problem [Ten10]. Here, we
resolve this question.

Very closely related to the research we have just described is a sequence of works [BV06,
ANRV07, RV07, RT09], starting with Beier and Vöcking and culminating with Röglin and Teng,
showing that binary optimization problems are solvable in expected polynomial time in the smoothed
analysis framework if and only if they are solvable in randomized pseudopolynomial time in the
worst case.

1.4 Our contribution 

In this work we give an affirmative answer to the open problem of Röglin and Teng, reducing their
bound from roughly $n^{2^{d-1}(d+1)!}$ to $n^{2d}$. Thus the exponent on $n$ can in fact be made linear in $d$. In
particular, we prove that the expected number of Pareto optimal solutions in the model described
above is at most $2 \cdot (4\phi d)^{d(d+1)/2} \cdot n^{2d}$. It is interesting to compare our result with what is known
about Pareto optima when $2^n$ points are chosen independently and uniformly in $[-1, 1]^{d+1}$. In this
scenario, old results [BKST78, Dev80, Buc89] show that the expected number of Pareto optima is
$\Theta(n^d)$ for each constant $d$. Our bound is within a square of this quantity, despite the significant
dependencies in the model. We also note that this square is necessary at least for $d = 1$, due to the
$\Omega(n^2)$ lower bound of Beier and Vöcking [BV04].

Usually, in smoothed analysis we are interested in demonstrating that a certain algorithm runs 
quickly or that a certain approximation algorithm returns a near-optimal solution. In such cases, 
one often defines an event - some property of the data that ensures an algorithm runs quickly or 
an approximation algorithm works well. This is true in the context of previous literature on the 
number of Pareto optimal solutions as well — indeed, the works of [BV06, RT09] are based on 
notions of winner gap and loser gap which can be interpreted as a discrete analogue to condition 
number. 

Our approach turns this around: We give a deterministic algorithm, which on input a Pareto 
optimal x, runs on the data and produces an event - in the form of a "testimony" containing 
clues about x's objective vector, x's coordinates, and the region of space B in which x's objective 
vector lies. Our family of events is rather complicated, but is defined implicitly based on a simple 
algorithm. 

We then give a second algorithm which can be regarded as a speculative execution of the 
first — it can uniquely reconstruct x from the testimony's clues and just some of the probability 
space's outcomes. The remainder of the probability space's outcomes are just enough to bound the 
probability that x's objective vector falls into the region B. So we are able to bound the probability 
that any particular "testimony" is output by the first algorithm, and consequently we are able to 
give an upper bound on the expected number of Pareto optimal solutions. 

2 Our result and approach 

In this section we will describe the problem formally, state our Main Theorem, and then briefly 
describe our approach. The remainder of the paper is devoted to the proof of the Main Theorem. 

2.1 Problem definitions 

Our setting captures the broad class of multiobjective binary optimization problems in which all 
(but one) of the objective functions are linear. We fix once and for all an arbitrary set $\mathcal{S} \subseteq \{0,1\}^n$
of solutions. These might encode knapsacks, the sets of edges forming a spanning tree in a given 
graph, or even the sets of edges forming a Hamiltonian cycle. 

Matrix notation. We think of solutions in $\mathcal{S} \subseteq \{0,1\}^n$ as column vectors. For a matrix (or
vector) $A$, we will write $A^i$ for the $i$'th row of $A$ and write $A_j$ for the $j$'th column of $A$; thus $A^i_j$ is
the $(i, j)$ entry of $A$. For $i < k$ we will also write $A^{i..k}$ for the submatrix of $A$ consisting of rows $i$
through $k$. Given matrices $A$ and $B$ of the same size we write $A \circ B$ for their Hadamard product,
i.e., their entry-wise product. Thus $(A \circ B)^i_j = A^i_j \cdot B^i_j$.

Values and objectives. Associated to each solution $x \in \mathcal{S}$ are $d + 1$ objectives; we encode them
with a column vector $\mathrm{Obj}(x) \in \mathbb{R}^{d+1}$. The first $d$ objectives are assumed to be linear and are chosen
in a "$\phi$-semirandom" fashion. More specifically, there is a $d \times n$ matrix $V$ of random variables
called values. (We typically write random variables in boldface.) We assume that each entry of
$V$ is an independent, continuous random variable with support on $[-1, 1]$ and pdf bounded by the
parameter $\phi$. The first $d$ objectives of solution $x$ are defined by the equation $\mathrm{Obj}^{1..d}(x) = Vx$.
(Recall that $x \in \{0,1\}^n$ is thought of as a column vector.) The $(d+1)$'th objectives of the solutions
are neither linear nor random. We assume merely that they are fixed, distinct real numbers,
chosen in advance of $V$. (Indeed, their magnitudes are not important for us, only their relative
ordering.) We will significantly abuse notation by writing $V^{d+1} x$ in place of $\mathrm{Obj}^{d+1}(x)$. In this
way, $\mathrm{Obj}^i(x) = V^i x$ holds for each $i \in [d + 1]$.



Pareto optima. Without loss of generality, we think of higher objectives as preferable. Accordingly,
given (column) vectors $p, q \in \mathbb{R}^{d+1}$ we say that $p$ dominates $q$ if $p \ge q$. Here "$\ge$" is to be
interpreted entry-wise when applied to vectors; i.e., $p$ dominates $q$ if $p^i \ge q^i$ for all $i \in [d + 1]$. We
will also sometimes use the notion of $t$-domination for $t < d + 1$; we say that $p$ $t$-dominates $q$ if
$p^{1..t} \ge q^{1..t}$. Given a set of points $\mathcal{P} \subseteq \mathbb{R}^{d+1}$ we say that $p \in \mathcal{P}$ is Pareto optimal (within $\mathcal{P}$) if $p$
is not dominated by any other point $q \in \mathcal{P}$; i.e., for all $q \in \mathcal{P} \setminus \{p\}$, there exists $i \in [d + 1]$ with
$p^i > q^i$. Of course, we will be interested in applying this concept to the objectives of the solutions
in $\mathcal{S}$. Given $V$, we consider $\mathcal{P} = \{\mathrm{Obj}(z) : z \in \mathcal{S}\} \subseteq [-n, n]^d \times \mathbb{R}$. We then say that the solution
$x \in \mathcal{S}$ is Pareto optimal if $\mathrm{Obj}(x)$ is Pareto optimal within $\mathcal{P}$. Finally, given $V$, we define $\mathcal{PO} \subseteq \mathcal{S}$
to be the set of all Pareto optimal solutions.
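
To make these definitions concrete, here is a brute-force Python sketch that computes the set of Pareto optimal solutions for a tiny instance. The representation is an assumption made for illustration: solutions are 0/1 tuples, `V` is a list of `d` rows, and `last` is a hypothetical dictionary holding the fixed $(d+1)$'th objective of each solution. The sketch takes quadratic time in the number of solutions and assumes distinct solutions have distinct objective vectors (which holds with probability 1 in the model).

```python
from typing import Dict, List, Sequence, Tuple

Solution = Tuple[int, ...]


def objectives(V: Sequence[Sequence[float]], last: Dict[Solution, float], x: Solution) -> Tuple[float, ...]:
    """Obj(x): the first d entries are V x, the final entry is the fixed value last[x]."""
    linear = tuple(sum(v * xi for v, xi in zip(row, x)) for row in V)
    return linear + (last[x],)


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q when p is entry-wise at least q."""
    return all(pi >= qi for pi, qi in zip(p, q))


def pareto_optima(solutions: List[Solution], V: Sequence[Sequence[float]], last: Dict[Solution, float]) -> List[Solution]:
    """Return the solutions whose objective vector is dominated by no other solution's."""
    objs = {x: objectives(V, last, x) for x in solutions}
    return [
        x for x in solutions
        if not any(dominates(objs[y], objs[x]) for y in solutions if y != x)
    ]
```

For instance, with $d = 1$, $n = 2$, `V = [[0.5, -0.25]]` and `last` assigning 3, 1, 2, 0 to the solutions (0,0), (0,1), (1,0), (1,1), only (0,0) and (1,0) are Pareto optimal.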



2.2 Our result 

We can now state our Main Theorem: 



Main Theorem. $\mathbf{E}_V\big[|\mathcal{PO}|\big] \le 2 \cdot (4\phi d)^{d(d+1)/2} \cdot n^{2d}$.



2.3 Our approach 

To prove the Main Theorem we use a probabilistic argument which has a rather unusual form. 
Unfortunately, it is also fairly intricate. In this section we will try to convey some of the ideas of 
the argument while hiding a number of complicating details. 

Our proof can be seen as a $d$-dimensional generalization of the Beier-Röglin-Vöcking $O(\phi n^2)$
upper bound for the $d = 1$ case (which we will later sketch). The reader is advised to keep
the cases $d = 1, 2$ in mind for visualization purposes. Recall that the solutions $x \in \mathcal{S}$ have $d$
semirandom linear objectives but their $(d + 1)$'th objectives are fixed in advance arbitrarily. Once
the values $V$ are drawn and the objectives $\mathrm{Obj}^{1..d}(x) \in [-n, n]^d$ thus determined, one can think
of identifying the Pareto optima among $\mathcal{S}$ via a "sweep" along the $(d + 1)$'th dimension. This
means proceeding through the solutions $x \in \mathcal{S}$ in decreasing order of $\mathrm{Obj}^{d+1}(x)$ and considering the
"point" $\mathrm{Obj}^{1..d}(x) = Vx \in [-n, n]^d$; the points which are not $d$-dominated by any previously
seen point correspond exactly to the Pareto optimal solutions.
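
The sweep just described can be sketched as follows. This is a minimal Python illustration under our own assumptions: solutions are 0/1 tuples, `V` is a list of `d` rows, and `last` is a hypothetical dictionary giving the fixed, distinct $(d+1)$'th objectives.

```python
from typing import Dict, List, Sequence, Tuple

Solution = Tuple[int, ...]


def pareto_by_sweep(solutions: List[Solution], V: Sequence[Sequence[float]], last: Dict[Solution, float]) -> List[Solution]:
    """Sweep in decreasing order of the (d+1)'th objective, keeping non-d-dominated points."""
    order = sorted(solutions, key=lambda x: last[x], reverse=True)
    seen: List[Tuple[float, ...]] = []  # points Vz of previously seen solutions z
    optima: List[Solution] = []
    for x in order:
        point = tuple(sum(v * xi for v, xi in zip(row, x)) for row in V)
        # x is Pareto optimal exactly when no earlier point d-dominates Vx.
        d_dominated = any(all(s >= p for s, p in zip(prev, point)) for prev in seen)
        if not d_dominated:
            optima.append(x)
        seen.append(point)
    return optima
```

Every solution seen earlier has a larger $(d+1)$'th objective, so it dominates $x$ precisely when it $d$-dominates $x$; solutions seen later can never dominate $x$.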



Boxes and density. An oversimplification of our proof is to think of it as showing that the
"probability density" of Pareto optimal points in $[-n, n]^d$ is not too high; roughly $O(n^d)$. In aid
of making this formal, we fix once and for all a real number $\varepsilon > 0$ which should be thought of as
extremely small, $\varepsilon \ll 1/(\phi d 2^{2n})$. Additionally, we assume that $1/\varepsilon$ is an integer. We then introduce
the following definition:

Definition 2.1. For a point $b \in (\varepsilon\mathbb{Z})^d$, we define the $d$-box based at point $b$ to be $b + [0, \varepsilon)^d$. Note
that the set of all $d$-boxes partitions $[-n, n)^d$ and indeed all of $\mathbb{R}^d$. More generally, for $t \in [d]$ and
$b \in (\varepsilon\mathbb{Z})^t$, we define the $t$-box based at point $b$ to be $(b + [0, \varepsilon)^t) \times \mathbb{R}^{d-t}$. The set of all $t$-boxes also
partitions $\mathbb{R}^d$.
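
As a small illustration of Definition 2.1, the Python sketch below finds the base point of the $t$-box containing a given point. The helper names are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used so that the half-open intervals $[b, b + \varepsilon)$ are handled without floating-point rounding; $\varepsilon$ is assumed to be passed as a `Fraction` such as `Fraction(1, 4)`.

```python
from fractions import Fraction
from math import floor
from typing import Sequence, Tuple


def box_base(point: Sequence, eps: Fraction, t: int) -> Tuple[Fraction, ...]:
    """Base b in (eps*Z)^t of the t-box containing point; coordinates after the t-th are unconstrained."""
    return tuple(floor(Fraction(c) / eps) * eps for c in point[:t])


def in_box(point: Sequence, base: Sequence[Fraction], eps: Fraction) -> bool:
    """Check b_i <= point_i < b_i + eps on the constrained coordinates (one per entry of base)."""
    return all(b <= Fraction(c) < b + eps for c, b in zip(point, base))
```

Because each coordinate is rounded down to a multiple of $\varepsilon$ independently, every point lies in exactly one $t$-box, which is the partition property noted in the definition.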
Since $\varepsilon$ is so small, the probability that there will be two different points $Vx$ and $Vx'$ in the
same $d$-box is negligible. Thus if $B$ denotes an arbitrary $d$-box, we can upper-bound the expected number
of Pareto optima by $(2n/\varepsilon)^d$ times the probability that there is a Pareto optimum $x \in \mathcal{S}$ with
$\mathrm{Obj}^{1..d}(x)$ in $B$. Our goal is to bound this probability by roughly $O(n^d)\varepsilon^d$.

In particular, we must make sure to keep the probability roughly comparable to $\varepsilon^d$. A crucial
aspect of our proof is that we can bound $\Pr[Vx \in B]$ by $(\phi\varepsilon)^d$ for any $x \ne 0$ while only using a
small part of the probability space. Specifically, suppose we select $j \in [n]$ such that $x^j \ne 0$, and
then imagine drawing all entries of $V$ except for the $j$'th column $V_j$. Then the final position of
the point $Vx$ is of the form $(p^1 + V^1_j, \dots, p^d + V^d_j)$, where the $p^i$'s are constants. This point will
lie in the box $B$ only if each value $V^i_j$ falls into a certain fixed interval of width $\varepsilon$. As the random
variables $V^i_j$ are independent and have pdf's bounded by $\phi$, the probability that all $V^i_j$'s fall into
the required intervals is at most $(\phi\varepsilon)^d$. Note that this argument works for any possible outcome of
the $d(n - 1)$ values outside of $V_j$.

Uniqueness. Unfortunately we cannot simply take this observation and union-bound over all
potential Pareto optima $x$, since this would lose a factor of $|\mathcal{S}|$. We would be in much better shape
if, after all values except for $V_j$ were drawn, there were very few solutions $x$ (or even just a
unique solution $x$) for which the event

$\mathcal{E}_x$ = "$x$ is Pareto optimal with $Vx \in B$"

had a chance of occurring. Here by "had a chance of occurring", we mean $\Pr_{V_j}[\mathcal{E}_x] > 0$. In
the simplest case of $d = 1$, Beier, Röglin, and Vöcking [BRV07] essentially show that this
holds if one adds some extra conditions to the event $\mathcal{E}_x$. We now sketch a reinterpretation of their
argument.

The Beier-Röglin-Vöcking argument. Note that since $d = 1$ for this sketch, the values
matrix $V$ is just a random (row) vector. For each $j \in [n]$ and 1-box (interval) $B$, let us define the
significantly more complicated event

$\mathcal{E}_{x,j,B}$ = "$x^j = 1$, $Vx \in B$, $x$ is Pareto optimal, and the 'next' Pareto optimum $y$ has $y^j = 0$".

Here 'next' refers to the "sweep along the 2nd coordinate"; i.e., $y$ is the solution $z$ with maximal
$\mathrm{Obj}^2(z)$ among $\{z \in \mathcal{S} : Vz > Vx\}$. The Beier-Röglin-Vöcking argument takes a union bound
over all $j \in [n]$ in addition to over all $B$. The key to their argument is the following "uniqueness"
claim: for any draw of the values other than $V_j$, there is a unique $x \in \mathcal{S}$ for which the event $\mathcal{E}_{x,j,B}$
has a chance of occurring. Given this claim, the proof is almost complete. For that unique $x$ the
event $\mathcal{E}_{x,j,B}$ still has at most a $\phi\varepsilon$ chance of occurring, since $x^j$ must be 1 and the value $V_j$ is
still independent and undrawn. Union-bounding over all $j$ and $B$, one concludes that the expected
value of

#{Pareto optimal $x$ : the 'next' Pareto optimum $y$ has $y^j \ne 1 = x^j$ for some $j$}

is at most $n \cdot (2n/\varepsilon) \cdot (\phi\varepsilon) = 2\phi n^2$. This almost counts the total number of Pareto optima. Certainly
for each Pareto optimum $x$, there is some coordinate $j$ such that the 'next' Pareto optimum $y$ has
$y^j \ne a = x^j$; it's just that this bit $a$ might be 0 rather than 1. The Beier-Röglin-Vöcking argument is
concluded (essentially) by union-bounding over $a \in \{0, 1\}$ as well. (It may seem crucial that $x^j$
was 1 and not 0 when we observed that $\Pr_{V_j}[Vx \in B] \le \phi\varepsilon$. This difficulty is overcome with an
additional trick, changing the condition $Vx \in B$ in $\mathcal{E}_{x,j,B}$ to the condition $Vx - V_j a \in B$ in $\mathcal{E}_{x,j,a,B}$.)
The Röglin-Teng argument. How can we generalize this argument to $d$ dimensions? Röglin
and Teng roughly take the following approach. First, they generalize the above argument to show
that for $d = 1$, the expected $c$'th power of the number of Pareto optima is $(\phi n^2)^{c(1+o(1))}$. This gives
them a concentration result for the number of Pareto optima. They then proceed by induction on
the dimension $d$. In reducing from dimension $d$ to $d - 1$ there are two difficulties. First, instead of
having a unique $x$ to worry about as in the Beier-Röglin-Vöcking argument, they need to worry about all
solutions in a $(d - 1)$-dimensional Pareto set. One expects this not to be too large, by induction;
however, their argument needs a high-probability result. Hence they need to inductively bound
higher powers of the number of Pareto optima. This induction leads to the rather bad dependence
on $d$. A second difficulty they face comes from their use of conditioning to separate the $d$'th
dimension from the first $d - 1$; this introduces dependencies that they must work to control.

Our argument. We define a family of events $\mathcal{E}_{x,J,A,\mathcal{B}}$. These events are again of the form "$x$
falls into a box related to $\mathcal{B}$ and certain other lower-dimensional conditions happen". We need to
define these other conditions in an extremely careful way so that the following holds:

Based on $J$, there is a way to partition the draw of $V$ into two parts called
$\overline{M(J)} \circ V$ and $M(J) \circ V$. Given the outcome of $\overline{M(J)} \circ V$, there is a unique
$x \in \mathcal{S}$ for which $\mathcal{E}_{x,J,A,\mathcal{B}}$ can occur. Furthermore, the randomness remaining
in $M(J) \circ V$ is such that the probability of $\mathcal{E}_{x,J,A,\mathcal{B}}$ can still be bounded
by an appropriately small quantity.

We manage to identify the necessary conditions; however they are complicated enough that
they cannot be described with just a sentence of text. Instead, we come to the first unusual
aspect of our argument: the extra conditions are of the form "a certain deterministic algorithm
Witness, when run with input $x$ and $V$, produces a certain output testimony". At this point the
reader might think that such conditions have no chance to satisfy the property displayed above: in
particular, since Witness depends on $V$, how can knowing its output still leave the $M(J) \circ V$
part of the probability space free? We overcome this problem with a second unusual idea. We
introduce another deterministic algorithm called Reconstruct, which takes as input the output
Witness$(x, V)$, along with the outcome of $\overline{M(J)} \circ V$. We show that using just this information,
Reconstruct can recover the input $x$, assuming that it is Pareto optimal. This ability to reverse-engineer
$x$ gives us the needed "uniqueness" property. Moreover, Reconstruct does not
need to know $M(J) \circ V$, and yet this amount of remaining randomness is still enough to bound
the probability that $x$ falls into certain boxes.

3 Outline of the proof 

At this point we move from intuition to precise details. In this section we give the overall structure 
of our proof of the Main Theorem. By the end of this outline we will have reduced it to a number 
of lemmas, which are then proven in the appendices of the paper. 

3.1 Testimonies 

The first key ingredient in our proof is a deterministic map we call Witness, which takes as input
a solution $x \in \mathcal{S}$ and a fixed matrix of values $V$, and outputs a "testimony" $(J, A, \mathcal{B})$:

Witness : $(x, V) \mapsto (J, A, \mathcal{B})$.
(The map Witness also depends on the fixed quantities $n$, $\varepsilon$, $\mathcal{S}$, and the $(d + 1)$'th objectives
$\mathrm{Obj}^{d+1}(z)$.) We will actually only care about the behavior of Witness$(x, V)$ when the values $V$
make $x$ into a Pareto optimum, but it is clearest to define the mapping for every pair of $x$ and $V$.

Regarding the testimony itself, roughly speaking $J$ is a list of $d$ coordinates in $[n]$, $A$ is a
"diagonalization matrix" consisting of $d$ bits per coordinate in $J$, and $\mathcal{B}$ is a list of $t$-boxes, one
for each $t \in [d]$. Very roughly speaking, the meaning of Witness$(x, V) = (J, A, \mathcal{B})$ is that the bits
$\{x^j : j \in J\}$ agree with certain bits in $A$ and that $Vx$ falls into the boxes in $\mathcal{B}$ (or rather, that
a slight translation of $Vx$ based on $A$ falls into these boxes). Precise details are given in Section 4,
but they are not important for understanding the outline of the proof. On first reading, one should
think of the number of possible testimonies as something roughly like $n^{2d}/\varepsilon^{d(d+1)/2}$.



3.2 The OK event 

We will also need to define a simple event based on the random draw of $V$ which we call OK. In
studying Pareto optima we prefer not to distinguish between domination and "strict" domination.
Luckily we don't have to: since the entries of $V$ are continuous random variables, the probability
that $V^i x = V^i y$ for any $i \in [d]$ and distinct $x, y \in \mathcal{S}$ is 0. Our event OK, which we now formally
define, slightly generalizes this:

Definition 3.1. OK = OK$(V)$ is defined to be the event that $|V^i x - V^i y| \ge \varepsilon$ for all $i \in [d]$ and
distinct $x, y \in \mathcal{S}$.

We require the following simple lemma: 

OK Lemma. $\Pr[\neg\mathrm{OK}] \le \phi d 2^{2n+1} \varepsilon$.

Proof: For each fixed $i \in [d]$ and distinct $x, y \in \{0,1\}^n$, we show that $\Pr[|V^i x - V^i y| < \varepsilon] \le 2\phi\varepsilon$;
the result then follows by a union bound over the $d$ choices of $i$ and the at most $2^{2n}$ pairs $x, y$.
Since $x$ and $y$ are distinct we may select $j \in [n]$ such
that $x^j - y^j = 1$, after possibly exchanging $x$ and $y$. Now imagine that the values $\{V^i_k : k \ne j\}$ are
drawn first; then the event $|V^i x - V^i y| < \varepsilon$ becomes of the form $|c + V^i_j| < \varepsilon$ for some constant $c$.
By independence, the random variable $V^i_j$ still has pdf bounded by $\phi$, so this event has probability
at most $\phi \cdot 2\varepsilon$, as desired. ■
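
For intuition, the event OK and the bound from the OK Lemma can be checked numerically on tiny instances. The Python sketch below uses hypothetical helper names and represents $V$ as a list of rows and solutions as 0/1 tuples. It relies on the observation that, after sorting the values $V^i x$, the smallest gap occurs between neighbours.

```python
from typing import List, Sequence, Tuple

Solution = Tuple[int, ...]


def is_ok(V: Sequence[Sequence[float]], solutions: List[Solution], eps: float) -> bool:
    """OK holds when |V^i x - V^i y| >= eps for every row i and all distinct x, y."""
    for row in V:
        values = sorted(sum(v * xi for v, xi in zip(row, x)) for x in solutions)
        # It suffices to inspect consecutive values in sorted order.
        if any(b - a < eps for a, b in zip(values, values[1:])):
            return False
    return True


def not_ok_bound(phi: float, d: int, n: int, eps: float) -> float:
    """The OK Lemma's upper bound phi * d * 2^(2n+1) * eps on Pr[not OK]."""
    return phi * d * 2 ** (2 * n + 1) * eps
```

The bound is only informative when $\varepsilon$ is far below $1/(\phi d 2^{2n})$, which is why $\varepsilon$ is taken to be extremely small.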

3.3 Proof of the Main Theorem 

We are now able to outline the proof of the Main Theorem. 



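$$\mathbf{E}_V\big[|\mathcal{PO}|\big] = \mathbf{E}_V\big[|\mathcal{PO}| \cdot 1[\mathrm{OK}]\big] + \mathbf{E}_V\big[|\mathcal{PO}| \cdot 1[\neg\mathrm{OK}]\big]. \qquad (1)$$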



Regarding the second term in (1), naively we have 



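$$\mathbf{E}_V\big[|\mathcal{PO}| \cdot 1[\neg\mathrm{OK}]\big] \le \mathbf{E}_V\big[2^n \cdot 1[\neg\mathrm{OK}]\big] = 2^n \Pr[\neg\mathrm{OK}] \le \phi d 2^{3n+1} \varepsilon, \qquad (2)$$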



using the OK Lemma. As for the first (main) term in (1), we break it up according to the possible 
testimonies: 



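$$\mathbf{E}_V\big[|\mathcal{PO}| \cdot 1[\mathrm{OK}]\big] = \sum_{(J,A,\mathcal{B})} \mathbf{E}_V\Big[\sum_{x \in \mathcal{S}} 1[x \in \mathcal{PO}] \cdot 1[\mathrm{Witness}(x, V) = (J, A, \mathcal{B})] \cdot 1[\mathrm{OK}]\Big]. \qquad (3)$$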
For a given draw of values $V$, it is possible to show that if the event OK occurs, then the different
$x \in \mathcal{PO}$ generate distinct testimonies $(J, A, \mathcal{B})$. (This follows from the Testimony-Determines-PO
Lemma in Section 4.) In other words, for a fixed testimony $(J, A, \mathcal{B})$, after $V$ is drawn there
can be at most one $x \in \mathcal{S}$ for which the event

$(x \in \mathcal{PO}) \wedge (\mathrm{Witness}(x, V) = (J, A, \mathcal{B})) \wedge \mathrm{OK}$

occurred. This shows that (3) is at most the number of possible testimonies. Unfortunately,
that is not a helpful bound because the number of possible testimonies includes the huge factor
$(1/\varepsilon)^{d(d+1)/2}$.

We now come to the key idea in the proof. For each fixed testimony $(J, A, \mathcal{B})$, we split up the
draw of $V$ into two stages in a way that depends on $J$. In the first stage, "most" of the $dn$ entries
of $V$ are drawn; we denote these entries by $\overline{M(J)} \circ V$ for reasons to be explained later. In the
second stage, the remaining "few" entries of $V$ are drawn (independently, of course); we denote
this second set of entries by $M(J) \circ V$. On first reading, one should think of "few" as meaning
$d(d + 1)/2$. Now the key idea is that the uniqueness property described above holds even after just
drawing $\overline{M(J)} \circ V$:



Uniqueness Lemma. Fix a testimony $(J, A, \mathcal{B})$ and fix the outcome of $\overline{M(J)} \circ V$. Then there
exists at most one $x \in \mathcal{S}$ such that the event

$(x \in \mathcal{PO}) \wedge (\mathrm{Witness}(x, V) = (J, A, \mathcal{B})) \wedge \mathrm{OK}$

can occur. Here the event's randomness is just the draw of $M(J) \circ V$.

Based on this idea, we write (3) as 



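$$\sum_{(J,A,\mathcal{B})} \mathbf{E}_{\overline{M(J)} \circ V}\Big[\sum_{x \in \mathcal{S}} \Pr_{M(J) \circ V}\big[(x \in \mathcal{PO}) \wedge (\mathrm{Witness}(x, V) = (J, A, \mathcal{B})) \wedge \mathrm{OK}\big]\Big].$$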



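The Uniqueness Lemma says that for each choice of $(J, A, \mathcal{B})$ and $\overline{M(J)} \circ V$, at most one of the
probabilities in the above expression can be nonzero. Hence we may upper-bound (3) by

$$\sum_{(J,A,\mathcal{B})} \mathbf{E}_{\overline{M(J)} \circ V}\Big[\max_{x \in \mathcal{S}} \Pr_{M(J) \circ V}\big[(x \in \mathcal{PO}) \wedge (\mathrm{Witness}(x, V) = (J, A, \mathcal{B})) \wedge \mathrm{OK}\big]\Big]. \qquad (4)$$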



We now complete the proof by showing that there is enough randomness left in $M(J) \circ V$ so
that for any $x \in \mathcal{S}$, even the probability of the event Witness$(x, V) = (J, A, \mathcal{B})$ is small. We bound
this probability in terms of a parameter called $\dim(\mathcal{B})$, which we define in Section 4. For now, it
suffices to know that $\dim(\mathcal{B})$ is an integer between 0 and $d(d + 1)/2$; on first reading, one should
think of it as simply always being $d(d + 1)/2$.



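Boundedness Lemma. For every fixed $(J, A, \mathcal{B})$, outcome of $\overline{M(J)} \circ V$, and $x \in \mathcal{S}$, it holds that

$$\Pr_{M(J) \circ V}\big[\mathrm{Witness}(x, V) = (J, A, \mathcal{B})\big] \le \phi^{\dim(\mathcal{B})} \varepsilon^{\dim(\mathcal{B})}.$$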



Using this in (4) we upper-bound (3) by 



$$\sum_{(J,A,\mathcal{B})} \phi^{\dim(\mathcal{B})} \varepsilon^{\dim(\mathcal{B})}. \qquad (5)$$

As mentioned, on first reading one should think of $\dim(\mathcal{B})$ as always being $d(d + 1)/2$ and one
should think of the number of possible testimonies as being roughly $n^{2d}/\varepsilon^{d(d+1)/2}$. Thus (5) is
roughly $\phi^{d(d+1)/2} \cdot n^{2d}$, comparable to the quantity in the Main Theorem. We will eventually do a
more precise but straightforward estimation to bound (5) (and hence (3)):
Counting Lemma. For a fixed $n$ and $\varepsilon$,

$$\sum_{\text{possible testimonies } (J,A,\mathcal{B})} \phi^{\dim(\mathcal{B})} \varepsilon^{\dim(\mathcal{B})} \le 2 \cdot (4d\phi)^{d(d+1)/2} \cdot n^{2d}.$$

Substituting this bound on (3), as well as the bound (2), into (1) yields 



$$\mathbf{E}_V\big[|\mathcal{PO}|\big] \le 2 \cdot (4d\phi)^{d(d+1)/2} \cdot n^{2d} + \phi d 2^{3n+1} \varepsilon.$$

Since we can make $\varepsilon$ arbitrarily small, the proof of the Main Theorem is complete.

4 Testimonies 

In this section we describe the Witness algorithm, which assumes $n$, $\varepsilon$, $\mathcal{S}$, and the $(d + 1)$'th
objectives $\mathrm{Obj}^{d+1}(z)$ are fixed. The input to Witness is a solution $x \in \mathcal{S}$ and a $d \times n$ matrix of
values $V$. The output is a "testimony", which is a triple $(J, A, \mathcal{B})$.



4.1 Components of a testimony 

We now describe the components of a testimony. 

Index vector. We call the first component, $J$, the "index vector". This is defined to be a length-$d$
row vector from $([n] \cup \{\bot\})^d$ in which all non-$\bot$ indices are distinct. On first reading, one should
ignore the possibility of $\bot$'s and simply think of an index vector $J$ as an ordered list of $d$ distinct
indices from $[n]$.

Diagonalization matrix. We call the second component, $A$, a "diagonalization matrix". $A$ is
an $n \times d$ matrix with entries from $\{0, 1, \bot\}$. Most entries in $A$ will be 0; indeed, the row $A^j$ will be
nonzero only if $j$ is one of the indices in $J$. Before describing $A$ completely formally, let us describe
the "typical" case when $J$ contains no $\bot$'s, and thus just consists of distinct indices from $[n]$. In
this case, $A$ will also contain no $\bot$'s. To make the picture even clearer, let us imagine that $J$ is
simply $(1, 2, \dots, d)$. Thus $A$ will only be nonzero in its first $d$ rows, so let us write $A' = A^{1..d}$. In
this case, if $x \in \mathcal{S}$ is the input to Witness, then $A'$ will be of the following form:



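$$A' = \begin{pmatrix} \overline{x^1} & * & * & \cdots & * \\ x^2 & \overline{x^2} & * & \cdots & * \\ x^3 & x^3 & \overline{x^3} & \cdots & * \\ \vdots & & & \ddots & \vdots \\ x^d & x^d & x^d & \cdots & \overline{x^d} \end{pmatrix}$$

Here each $x^j$ is of course in $\{0, 1\}$, we write $\overline{x^j}$ for $1 - x^j$, and $*$ denotes that the entry may be
either 0 or 1. We say that $A$ diagonalizes $x$ on $J = (1, 2, \dots, d)$. We now give the formal definition,
which includes the possibility of $J$ containing $\bot$'s.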

Definition 4.1. Given an index vector $J$ and a solution $x \in \{0,1\}^n$, we say that the matrix
$A \in \{0, 1, \bot\}^{n \times d}$ is a diagonalization matrix, and in particular that it diagonalizes $x$ on $J$, if the
following conditions hold: If $j \in [n]$ does not appear in $J$, then row $A^j$ is all zeros. Otherwise, if
$j = J_u \in [n]$ for some $u \in [d]$:

$A^j_t = \bot$ if and only if $J_t = \bot$,
$A^j_u = \overline{x^j}$,
$A^j_t = x^j$ for all $t < u$ with $J_t \ne \bot$.    (6)
Box list. The last component of a testimony, $\mathcal{B}$, is a list $\mathcal{B} = (B_1, \dots, B_d)$. For $t \in [d]$ we have
that $B_t = \bot$ if $J_t = \bot$, and otherwise $B_t$ is a $t$-box, as defined in Section 2.3. We define the
dimension of the box list $\mathcal{B}$ to be $\dim(\mathcal{B}) = \sum\{t \in [d] : B_t \ne \bot\}$. On first reading, one should ignore the
possibility of $B_t = \bot$, in which case $\dim(\mathcal{B})$ is always $1 + 2 + \cdots + d = d(d + 1)/2$.

Masking matrix. Having defined the components $(J, A, \mathcal{B})$ of a testimony, we now explain one
more piece of notation: that of a masking matrix. Given an index vector $J$, we define the associated
masking matrix $M(J) \in \{0, 1\}^{d \times n}$ as follows:

$$M(J)^i_j = \begin{cases} 1 & \text{if } j = J_t \in [n] \text{ for some } t \in [d] \text{ and } i \le t, \\ 0 & \text{otherwise.} \end{cases}$$

For illustration, if $J = (1, 2, \ldots, d)$, then $M(J)$ is the mostly-zeros $d \times n$ matrix whose left-most
$d \times d$ submatrix is

$$\begin{pmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
0 & 1 & 1 & 1 & \cdots & 1 \\
0 & 0 & 1 & 1 & \cdots & 1 \\
0 & 0 & 0 & 1 & \cdots & 1 \\
\vdots & & & & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}.$$

Note that in the "typical" case that $J$ contains no $\perp$'s, the number of 1's in $M(J)$ is exactly
$d(d+1)/2$. Given a masking matrix, we write $\overline{M}(J)$ for its bitwise complement; i.e., $\overline{M}(J)^i_j =
1 - M(J)^i_j$. We are now able to explain the notation used in the key step of the proof of the Main
Theorem. Given the semi-random matrix of values $V$, note that for any $J$,

$$V = M(J) \circ V + \overline{M}(J) \circ V.$$

Further, the random matrices $M(J) \circ V$ and $\overline{M}(J) \circ V$ are independent of one another. This gives
our crucial means of separating the random draw of $V$ into two stages.
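
To make the masking notation concrete, here is a small sketch that builds $M(J)$, its complement, and the two "stages" $M(J) \circ V$ and $\overline{M}(J) \circ V$. It is an illustration only: the names `masking_matrix`, `hadamard` and `complement` are ours, indices are 0-based, $\perp$ is represented by `None`, and the uniform random values stand in for an arbitrary draw of $V$.

```python
import random

BOT = None  # our stand-in for the symbol ⊥

def masking_matrix(J, n):
    """Return M(J) as a list of d rows of length n (0-based indices)."""
    d = len(J)
    M = [[0] * n for _ in range(d)]
    for t, j in enumerate(J):
        if j is not BOT:
            for i in range(t + 1):   # rows i <= t, as in the definition
                M[i][j] = 1
    return M

def complement(M):
    """Bitwise complement of a 0/1 matrix."""
    return [[1 - m for m in row] for row in M]

def hadamard(M, V):
    """Entrywise product M o V."""
    return [[m * v for m, v in zip(mrow, vrow)] for mrow, vrow in zip(M, V)]

d, n = 4, 6
J = list(range(d))                   # the "typical" case J = (1, ..., d)
M = masking_matrix(J, n)

rng = random.Random(0)
V = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(d)]
first_stage = hadamard(M, V)                 # M(J) o V
second_stage = hadamard(complement(M), V)    # complement of M(J), o V
recombined = [[a + b for a, b in zip(r1, r2)]
              for r1, r2 in zip(first_stage, second_stage)]
```

Every entry of $V$ lands in exactly one of the two stages, which is why the two pieces involve disjoint sets of entries of $V$.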

4.2 The Witness mapping 

Here is the deterministic algorithm computing the Witness mapping:

Witness$(x, V)$:

 1. Set $\mathcal{R}_{d+1} = S$.
 2. Initialize $J$ to the length-$d$ column vector $(\perp, \perp, \ldots, \perp)$.
 3. Initialize $Y$ to the $n \times d$ matrix where every entry is $\perp$.
 4. For $t = d, d-1, d-2, \ldots, 1$:
 5.     Let $\mathcal{C}_t = \{z \in \mathcal{R}_{t+1} : V^{1..t} z > V^{1..t} x\}$.
 6.     If $\mathcal{C}_t \neq \emptyset$,
 7.         Set column $Y_t$ to be the $y \in \mathcal{C}_t$ for which $V^{t+1} y$ is maximal.†
 8.         Set $J_t$ to be the least index in $[n]$ such that $Y_t^{J_t} \neq x^{J_t}$.‡
 9.         Set $\mathcal{R}_t = \{z \in \mathcal{R}_{t+1} : V^{t+1} z > V^{t+1} Y_t \text{ and } z^{J_t} = x^{J_t}\}$.
10.     Else
11.         Set $\mathcal{R}_t = \mathcal{R}_{t+1}$.
12.     End If
13. End For
14. Define the $n \times d$ matrix $A$ by $A^j_u = Y^j_u$ if $j$ appears in $J$, and $A^j_u = 0$ otherwise.
15. Define the Box list $\mathcal{B} = (B_1, \ldots, B_d)$ as follows: For $u \in [d]$, if $J_u = \perp$ then set $B_u = \perp$.
    Otherwise, set $B_u$ to be the $u$-box containing $Vx - (M(J) \circ V) A_u$.
16. Output $(J, A, \mathcal{B})$.

† Two comments about this line: Regarding maximality, say that we break ties by lexicographic order.
Regarding the case $t = d$, recall our abuse of notation: $V^{d+1} y$ is defined to be $\mathrm{Obj}^{d+1}(y)$.

‡ Such an index must exist: $Y_t \neq x$ because $Y_t \in \mathcal{C}_t$ and therefore $V^{1..t} Y_t > V^{1..t} x$.

It is clear that the index vector $J$ and the Box list $\mathcal{B}$ output by Witness have the form we
claimed. We now verify that Witness$(x, V)$ indeed outputs a proper diagonalization matrix $A$:

Proposition 4.2. The matrix $A$ output by Witness$(x, V)$ always diagonalizes $x$ on $J$.

Proof: At the end of the algorithm, by definition row $A^j$ is all zeros if $j$ does not appear in $J$.
Thus it remains to analyze each row $A^{J_u}$, where $u \in [d]$ is such that $J_u \neq \perp$. By definition, we
have $A^{J_u}_t = Y^{J_u}_t$ for each $t \in [d]$. Thus we need to verify the three conditions in (6) for $Y^{J_u}_t$. First,
$Y^{J_u}_t = \perp$ if and only if $Y_t$ was "not defined" during iteration $t$ of the algorithm (i.e., if $\mathcal{C}_t = \emptyset$),
which occurs precisely when $J_t = \perp$. Next, $Y^{J_u}_u = \bar{x}^{J_u}$ by definition of $J_u$. Finally, because of line (9)
in Witness we have that $z^{J_u} = x^{J_u}$ for all $z \in \mathcal{R}_u$. Thus for any $t < u$ where $J_t \neq \perp$ (and thus $\mathcal{C}_t \neq \emptyset$),
we have $Y^{J_u}_t = x^{J_u}$ because $Y_t \in \mathcal{C}_t \subseteq \mathcal{R}_{t+1} \subseteq \mathcal{R}_u$. ■

We also record another simple observation: 

Proposition 4.3. Given an execution of Witness$(x, V)$, any two solutions in $\mathcal{R}_t$ have the same
$J_t$'th coordinate, the same $J_{t+1}$'th coordinate, ..., and the same $J_d$'th coordinate (excluding the
cases $t \le u \le d$ where $J_u = \perp$).

Proof: For a fixed $t$ with $J_t \neq \perp$, the fact that all solutions in $\mathcal{R}_t$ have the same $J_t$'th coordinate
follows immediately from the definition of $\mathcal{R}_t$. The claim for coordinates $J_{t+1}, \ldots, J_d$ follows from
the fact that $\mathcal{R}_t \subseteq \mathcal{R}_{t+1} \subseteq \cdots \subseteq \mathcal{R}_d$. ■

This proposition combines with our definition of masking matrices in a crucial way: 

Masking Lemma. Given an execution of Witness$(x, V)$, for any $t \in [d]$ and $z, z' \in \mathcal{R}_t$,

$$V^t z > V^t z' \iff (\overline{M}(J) \circ V)^t z > (\overline{M}(J) \circ V)^t z'.$$

Proof: We have

$$V^t (z - z') = (M(J) \circ V)^t (z - z') + (\overline{M}(J) \circ V)^t (z - z').$$

By definition of $M(J)$, the row vector $(M(J) \circ V)^t$ has nonzero entries only in indices $J_t, J_{t+1}, \ldots, J_d$
(excluding those $J_u$'s which are $\perp$). But by Proposition 4.3, $z$ and $z'$ agree on these indices. Hence
$(M(J) \circ V)^t (z - z') = 0$, and therefore $V^t (z - z') = (\overline{M}(J) \circ V)^t (z - z')$. The lemma follows. ■

Finally, our proof of the key Uniqueness Lemma in Section 5 will rely on the following simpler 
uniqueness claim: 

Testimony Determines PO Lemma. Suppose that we run Witness$(x, V)$, where $V$ is an outcome
for the values such that $x$ is Pareto optimal and such that OK occurs. Then at the end of
the run, $x$ is uniquely defined by being the $z \in \mathcal{R}_1$ with maximal $V^1 z$.

We remark that the assumption that OK$(V)$ occurs is stronger than necessary; we only need
that $V^i y \neq V^i y'$ for all $i \in [d]$ and distinct $y, y' \in S$ (an event that occurs with probability 1).

Proof: We make the following two claims about the execution of Witness$(x, V)$:

Claim 1: For each $t \in [d+1]$ it holds that $x$ is not $t$-dominated by any $z \in \mathcal{R}_t$.

Claim 2: $x$ must be in $\mathcal{R}_1$.

Assuming these claims, the lemma follows immediately: $x \in \mathcal{R}_1$ by Claim 2, and no $z \in \mathcal{R}_1$ has
$V^1 z > V^1 x$ by Claim 1.

We begin by proving Claim 1. For $t = d + 1$, this follows immediately from the definition of
$x$ being Pareto optimal. For smaller $t$, let us consider the $t$'th iteration of the "For" loop, in which
$\mathcal{R}_t$ is defined. We need to consider two cases corresponding to the "If" condition. First suppose
$\mathcal{C}_t \neq \emptyset$, so lines (7)–(9) are executed. Now if there were some $z$ in the newly defined $\mathcal{R}_t$ which
$t$-dominated $x$, then it would satisfy $V^{t+1} z > V^{t+1} Y_t$ and $V^{1..t} z \ge V^{1..t} x$. Since the OK event
holds, the latter inequality can be strengthened to $V^{1..t} z > V^{1..t} x$. But this means $z$ must be in
the set $\mathcal{C}_t$. Since $V^{t+1} z > V^{t+1} Y_t$, we have a contradiction with how $Y_t$ was chosen in line (7). We
now consider the second case, that $\mathcal{C}_t = \emptyset$. In this case, $\mathcal{R}_t = \mathcal{R}_{t+1}$. Now by definition of $\mathcal{C}_t = \emptyset$,
there is no $z \in \mathcal{R}_{t+1} = \mathcal{R}_t$ which has $V^{1..t} z > V^{1..t} x$. Since the OK event occurs, we can strengthen
this statement to say that no $z \in \mathcal{R}_t$ can even have $V^{1..t} z \ge V^{1..t} x$, as needed.

We now prove Claim 2. Specifically, we show that $x \in \mathcal{R}_t$ for all $t \in [d+1]$ by (downward)
induction on $t$. The base case, that $x \in \mathcal{R}_{d+1}$, holds by definition. Assume then that $x \in \mathcal{R}_{t+1}$ for
some $t \in [d]$. Consider now the $t$'th iteration of the "For" loop. If the "If" condition does not hold
then $\mathcal{R}_t = \mathcal{R}_{t+1} \ni x$, as needed. Assume then that lines (7)–(9) are executed. To show $x \in \mathcal{R}_t$ it
suffices to show that $V^{t+1} x > V^{t+1} Y_t$. If this is not true, then $V^{t+1} Y_t \ge V^{t+1} x$, and $V^{1..t} Y_t > V^{1..t} x$
also, since $Y_t \in \mathcal{C}_t$. But that means that $Y_t \in \mathcal{R}_{t+1}$ $(t+1)$-dominates $x$, contradicting Claim 1. ■

5 The Uniqueness Lemma



Let us restate the Uniqueness Lemma. 

Uniqueness Lemma. Fix a testimony $(J, A, \mathcal{B})$ and fix the outcome of $\overline{M}(J) \circ V$. Then there
exists at most one $x \in S$ such that the event

$$\text{Witness}(x, V) = (J, A, \mathcal{B}) \tag{7}$$

can occur. Here the event's randomness is just the draw of $M(J) \circ V$.

We prove the Uniqueness Lemma in a roundabout way. Specifically, we introduce a second
deterministic algorithm called Reconstruct, which takes as input a testimony $(J, A, \mathcal{B})$ and fixed
values $\overline{M}(J) \circ V$, and outputs a solution $\tilde{x} \in S$ (or possibly 'FAIL'). Instead of the Uniqueness
Lemma as stated, we prove the following:

Uniqueness Lemma'. Let solution $x \in S$ and value matrix $V$ be such that $x$ is Pareto optimal
and such that event OK occurs. Assume further that Witness$(x, V) = (J, A, \mathcal{B})$. Then
Reconstruct$((J, A, \mathcal{B}), (\overline{M}(J) \circ V))$ outputs $x$.

This immediately implies the Uniqueness Lemma, as follows: Fix a testimony $(J, A, \mathcal{B})$ and an
outcome $\overline{M}(J) \circ V = \overline{M}(J) \circ \mathsf{V}$. Suppose there exist solutions $x, x' \in S$ for which event (7) can
occur (with possibly different outcomes for $M(J) \circ V$). Then Uniqueness Lemma' tells us that the
output of Reconstruct$((J, A, \mathcal{B}), (\overline{M}(J) \circ \mathsf{V}))$ is both $x$ and $x'$; hence $x = x'$.



The remainder of this section is devoted to the proof of Uniqueness Lemma'. We begin by 
defining the algorithm Reconstruct . 



Reconstruct$((J, A, \mathcal{B}), (\overline{M}(J) \circ V))$:

 1. Set $\tilde{\mathcal{R}}_{d+1} = S$.
 2. Initialize $\tilde{Y}$ to the $n \times d$ matrix where every entry is $\perp$.
 3. For $t = d, d-1, d-2, \ldots, 1$:
 4.     If $J_t \neq \perp$,
 5.         Write $b \in (\epsilon\mathbb{Z})^t$ for the base point of $B_t$.
 6.         Set $\tilde{\mathcal{C}}'_t = \{z \in \tilde{\mathcal{R}}_{t+1} : (\overline{M}(J) \circ V)^{1..t} z > b \text{ and } z^j = A^j_t \ \forall \text{ indices } j \in J\}$.
 7.         Set $\tilde{Y}_t$ to be the $y \in \tilde{\mathcal{C}}'_t$ for which $(\overline{M}(J) \circ V)^{t+1} y$ is maximal.*
 8.         Set $\tilde{\mathcal{R}}_t = \{z \in \tilde{\mathcal{R}}_{t+1} : (\overline{M}(J) \circ V)^{t+1} z > (\overline{M}(J) \circ V)^{t+1} \tilde{Y}_t \text{ and } z^{J_t} \neq \tilde{Y}_t^{J_t}\}$.
 9.     Else
10.         Set $\tilde{\mathcal{R}}_t = \tilde{\mathcal{R}}_{t+1}$.
11.     End If
12. End For
13. Output the $\tilde{x} \in \tilde{\mathcal{R}}_1$ for which $(\overline{M}(J) \circ V)^1 \tilde{x}$ is maximal.

* Some comments about this line. First, if $t = d$ then we interpret $(\overline{M}(J) \circ V)^{d+1} y$ to mean $\mathrm{Obj}^{d+1}(y)$. Second,
regarding maximality, we break ties by lexicographic order. Third, for some inputs to Reconstruct it is possible that
the set $\tilde{\mathcal{C}}'_t$ is empty; in this case one can think of Reconstruct as halting and outputting 'FAIL'. However we will only
be analyzing Reconstruct on inputs where this provably never happens. Finally, the first remark here also applies to
line (8) and the second and third remarks here also apply to line (13).

To prove Uniqueness Lemma', we fix $x$ and $V$ such that $x$ is Pareto optimal and such that event
OK occurs. We further suppose we have executed Witness$(x, V)$ producing $(J, A, \mathcal{B})$, and then
executed Reconstruct$((J, A, \mathcal{B}), (\overline{M}(J) \circ V))$ producing $\tilde{x}$. Our goal is to show that $\tilde{x} = x$. To
do this, we will analyze the internal variable assignments made in the executions of Witness and
Reconstruct. More specifically, the main task will be to show the following claim asserting that
$\tilde{\mathcal{R}}_t = \mathcal{R}_t$ for all $t \in [d+1]$. Once we show this, it will be easy to conclude that $\tilde{x} = x$ also.

Claim 5.1. $\tilde{\mathcal{R}}_t = \mathcal{R}_t$ for all $t \in [d+1]$.

Proof: The proof is by (downward) induction. The base case is clear, as $\tilde{\mathcal{R}}_{d+1} = \mathcal{R}_{d+1} = S$. For the
induction we assume that $\tilde{\mathcal{R}}_{u+1} = \mathcal{R}_{u+1}$ for some $u \in [d]$. We now show that $\tilde{\mathcal{R}}_u = \mathcal{R}_u$. In doing so,
we will also show that $\tilde{Y}_u = Y_u$. The set $\tilde{\mathcal{C}}'_u$ will not necessarily equal $\mathcal{C}_u$, but will be a subset of it.

We henceforth restrict attention to the $t = u$ iteration of the "For" loop in the execution of
Witness and Reconstruct, since this is when variables $\mathcal{R}_u$ and $\tilde{\mathcal{R}}_u$ were set. We consider two cases
depending on whether or not $J_u = \perp$. In the easy case that $J_u = \perp$, Witness must have assigned
$\mathcal{R}_u = \mathcal{R}_{u+1}$, and certainly Reconstruct assigned $\tilde{\mathcal{R}}_u = \tilde{\mathcal{R}}_{u+1}$. By induction, $\tilde{\mathcal{R}}_{u+1} = \mathcal{R}_{u+1}$, and
hence $\tilde{\mathcal{R}}_u = \mathcal{R}_u$ as required.

The remainder of the claim's proof is devoted to the case that $J_u \neq \perp$, in which case Witness
executed its lines (7)–(9) and Reconstruct executed its lines (5)–(8). The $B_u$ referred to in
Reconstruct's line (5) is defined at the end of Witness to be the $u$-box containing $Vx - (M(J) \circ V) A_u$.
By definition, this means the base point $b \in (\epsilon\mathbb{Z})^u$ used by Reconstruct is such that

$$V^{1..u} x - (M(J) \circ V)^{1..u} A_u \in b + [0, \epsilon)^u \implies V^{1..u} x \in \hat{b} + [0, \epsilon)^u,$$

where $\hat{b} = (M(J) \circ V)^{1..u} A_u + b$.

Recall that Witness defines $\mathcal{C}_u = \{z \in \mathcal{R}_{u+1} : V^{1..u} z > V^{1..u} x\}$. In fact, because we have assumed
$V$ causes event OK to occur, we may deduce

$$\mathcal{C}_u = \{z \in \mathcal{R}_{u+1} : V^{1..u} z > \hat{b}\}. \tag{8}$$

For if there were some $z \in \mathcal{R}_{u+1}$ and $i \in [u]$ with $\hat{b}^i < V^i z < \hat{b}^i + \epsilon$, we would have $|V^i z - V^i x| < \epsilon$,
contradicting the occurrence of OK. (The reader may note that this deduction is precisely the
reason we introduced the event OK.)

Next, recall that Witness defines $Y_u$ to be the $y \in \mathcal{C}_u$ for which $V^{u+1} y$ is maximal (and this
maximizer is unique since we assume OK occurs). Since $Y^j_u = A^j_u$ for all indices $j$ appearing in $J$,
we must also have that $Y_u$ is the maximizer of $V^{u+1} y$ among all $y$ within the following (nonempty)
subset of $\mathcal{C}_u$:

$$\mathcal{C}'_u := \{z \in \mathcal{R}_{u+1} : V^{1..u} z > \hat{b} \text{ and } z^j = A^j_u \text{ for all indices } j \in J\}. \tag{9}$$

Observe that

$$V^{1..u} z > \hat{b} \iff (M(J) \circ V)^{1..u} z + (\overline{M}(J) \circ V)^{1..u} z > (M(J) \circ V)^{1..u} A_u + b.$$

Since all $z \in \mathcal{C}'_u$ agree with $A_u$ in the indices from $J$, and since $M(J)$ is nonzero only in columns
whose indices are in $J$, we have that

$$(M(J) \circ V) z = (M(J) \circ V) A_u \text{ for every } z \in \mathcal{C}'_u. \tag{10}$$

Therefore an equivalent definition to (9) is

$$\mathcal{C}'_u = \{z \in \mathcal{R}_{u+1} : (\overline{M}(J) \circ V)^{1..u} z > b \text{ and } z^j = A^j_u \text{ for all indices } j \in J\}.$$

But $\tilde{\mathcal{R}}_{u+1} = \mathcal{R}_{u+1}$ by induction, and hence $\tilde{\mathcal{C}}'_u = \mathcal{C}'_u$.

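The remainder of the proof of the claim now follows fairly easily using the Masking Lemma
from Section 4. Recall that $Y_u$ is the maximizer of $V^{u+1} y$ among all $y \in \mathcal{C}'_u$. On the other hand,
Reconstruct defines $\tilde{Y}_u$ to be the $y \in \tilde{\mathcal{C}}'_u = \mathcal{C}'_u$ with maximal $(\overline{M}(J) \circ V)^{u+1} y$. We claim that
$\tilde{Y}_u = Y_u$. If $u = d$ then this is immediate, as both $V^{d+1} y$ and $(\overline{M}(J) \circ V)^{d+1} y$ are interpreted
as $\mathrm{Obj}^{d+1}(y)$. If $u < d$, this follows immediately from the Masking Lemma, using the fact that
$\mathcal{C}'_u \subseteq \mathcal{R}_{u+1}$.

Finally, we wish to show that $\tilde{\mathcal{R}}_u = \mathcal{R}_u$. Recall that

$$\begin{aligned}
\mathcal{R}_u &= \{z \in \mathcal{R}_{u+1} : V^{u+1} z > V^{u+1} Y_u \text{ and } z^{J_u} = x^{J_u}\}, \\
\text{and } \tilde{\mathcal{R}}_u &= \{z \in \tilde{\mathcal{R}}_{u+1} : (\overline{M}(J) \circ V)^{u+1} z > (\overline{M}(J) \circ V)^{u+1} \tilde{Y}_u \text{ and } z^{J_u} \neq \tilde{Y}_u^{J_u}\} \\
&= \{z \in \mathcal{R}_{u+1} : (\overline{M}(J) \circ V)^{u+1} z > (\overline{M}(J) \circ V)^{u+1} Y_u \text{ and } z^{J_u} = x^{J_u}\};
\end{aligned}$$

in this last deduction we used $\tilde{\mathcal{R}}_{u+1} = \mathcal{R}_{u+1}$ (by induction), $\tilde{Y}_u = Y_u$, and $Y_u^{J_u} = \bar{x}^{J_u}$ (which
follows from the definition of $J_u$ in Witness). If $u = d$ then $\tilde{\mathcal{R}}_u = \mathcal{R}_u$ again follows from the
interpretation $V^{d+1} z = (\overline{M}(J) \circ V)^{d+1} z = \mathrm{Obj}^{d+1}(z)$. If $u < d$ then $\tilde{\mathcal{R}}_u = \mathcal{R}_u$ again follows from
the Masking Lemma, noting that $z, Y_u \in \mathcal{R}_{u+1}$. This completes the proof of the induction and
hence the claim. ■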

Having proven Claim 5.1, it is easy to complete the proof of Uniqueness Lemma', i.e., to show
$\tilde{x} = x$. Since the values matrix $V$ is assumed to make $x$ Pareto optimal and make OK occur,
the Testimony-Determines-PO Lemma from Section 4 implies that $x$ is the solution $z \in \mathcal{R}_1$ with
maximal $V^1 z$. On the other hand, $\tilde{x}$ is defined to be the solution $z \in \tilde{\mathcal{R}}_1 = \mathcal{R}_1$ with maximal
$(\overline{M}(J) \circ V)^1 z$. But these maximizers are equal by the Masking Lemma.
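
The interplay of Witness and Reconstruct can be tried out on a toy instance. The sketch below is our own brute-force illustration of Uniqueness Lemma', not the authors' code, and it makes several assumptions that are not in the text: indices are 0-based, $\perp$ is `None`, $S = \{0,1\}^n$ is enumerated explicitly, a box is stored as its base point, the column $A_u$ is stored as `None` when $J_u = \perp$, ties are broken towards the lexicographically larger solution, and the instance (`V`, `obj`, `eps`) is hypothetical. The rows of `V` are signed powers of 3, so any two distinct solutions differ by at least $3^{-n} > \epsilon$ in every row, which is the separation the OK event is used for in (8). Exact rationals are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import product
from math import floor

BOT = None  # our stand-in for the symbol ⊥

def dot(row, z):
    return sum(r * zj for r, zj in zip(row, z))

def mask(J, d, n):
    """M(J): entry (i, j) is 1 iff j = J_t for some t >= i (0-based)."""
    M = [[0] * n for _ in range(d)]
    for t, j in enumerate(J):
        if j is not BOT:
            for i in range(t + 1):
                M[i][j] = 1
    return M

def witness(x, V, obj, S, eps):
    d, n = len(V), len(x)
    top = lambda k, z: obj(z) if k == d else dot(V[k], z)  # row k is V^{k+1}
    R, J, Y = list(S), [BOT] * d, [BOT] * d
    for t in range(d - 1, -1, -1):                         # paper's t = d, ..., 1
        C = [z for z in R if all(dot(V[i], z) > dot(V[i], x) for i in range(t + 1))]
        if C:
            Y[t] = max(C, key=lambda z: (top(t + 1, z), z))
            J[t] = min(j for j in range(n) if Y[t][j] != x[j])
            R = [z for z in R if top(t + 1, z) > top(t + 1, Y[t]) and z[J[t]] == x[J[t]]]
    used, M = {j for j in J if j is not BOT}, mask(J, d, n)
    A, B = [BOT] * d, [BOT] * d
    for u in range(d):
        if J[u] is not BOT:
            A[u] = tuple(Y[u][j] if j in used else 0 for j in range(n))
            p = [dot(V[i], x) - sum(M[i][j] * V[i][j] * A[u][j] for j in range(n))
                 for i in range(u + 1)]
            B[u] = tuple(eps * floor(pi / eps) for pi in p)  # base point of the u-box
    return J, A, B

def reconstruct(J, A, B, Vbar, obj, S):
    d = len(J)
    used = [j for j in J if j is not BOT]
    top = lambda k, z: obj(z) if k == d else dot(Vbar[k], z)
    R = list(S)
    for t in range(d - 1, -1, -1):
        if J[t] is not BOT:
            C = [z for z in R
                 if all(dot(Vbar[i], z) > B[t][i] for i in range(t + 1))
                 and all(z[j] == A[t][j] for j in used)]
            y = max(C, key=lambda z: (top(t + 1, z), z))   # ValueError here plays the role of 'FAIL'
            R = [z for z in R if top(t + 1, z) > top(t + 1, y) and z[J[t]] != y[J[t]]]
    return max(R, key=lambda z: (dot(Vbar[0], z), z))

# Hypothetical toy instance with n = 4 variables and d = 2 linear objectives.
n, d = 4, 2
unit = Fraction(1, 3 ** n)
V = [[1 * unit, -9 * unit, 27 * unit, 3 * unit],
     [-27 * unit, 3 * unit, 1 * unit, -9 * unit]]
eps = Fraction(2, 5) * unit
obj = lambda z: z[1] + z[3] - z[0]        # stands in for the objective Obj^{d+1}
S = list(product((0, 1), repeat=n))

def objectives(z):
    return [dot(row, z) for row in V] + [obj(z)]

pareto = [x for x in S
          if not any(z != x and all(a >= b for a, b in zip(objectives(z), objectives(x)))
                     for z in S)]
recovered = []
for x in pareto:
    J, A, B = witness(x, V, obj, S, eps)
    M = mask(J, d, n)
    Vbar = [[(1 - M[i][j]) * V[i][j] for j in range(n)] for i in range(d)]
    recovered.append(reconstruct(J, A, B, Vbar, obj, S))
```

Note that `reconstruct` is only handed the masked matrix `Vbar`, never the entries of $M(J) \circ V$; the testimony supplies the missing information.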

6 The Boundedness Lemma 

In this section we restate and prove the Boundedness Lemma. 

Boundedness Lemma. For every fixed $(J, A, \mathcal{B})$, outcome of $\overline{M}(J) \circ V$, and $x \in S$, it holds that

$$\Pr_{M(J) \circ V}\left[\text{Witness}(x, V) = (J, A, \mathcal{B})\right] \le \phi^{\dim(\mathcal{B})} \epsilon^{\dim(\mathcal{B})}.$$

Proof: As in the proof of the Uniqueness Lemma we fix the testimony $(J, A, \mathcal{B})$ and the outcome
$\overline{M}(J) \circ V = \overline{M}(J) \circ \mathsf{V}$. Unlike the proof of that lemma, we also fix $x \in S$. By Proposition 4.2 we
may assume that matrix $A$ diagonalizes $x$ on $J$; otherwise the probability of Witness$(x, V) =
(J, A, \mathcal{B})$ is 0.

Write $\mathcal{B} = (B_1, \ldots, B_d)$, where each $B_t$ is either a $t$-box or is $\perp$ (if $J_t = \perp$). For each $t \in [d]$
with $J_t \neq \perp$ we define the event

$$\mathrm{IN}_t = \text{“}Vx - (M(J) \circ V) A_t \in B_t\text{”},$$

where again, the randomness of these events is just the draw of $M(J) \circ V$. We may complete the
proof by showing

$$\Pr_{M(J) \circ V}\Big[\bigwedge_{t \in [d] : J_t \neq \perp} \mathrm{IN}_t\Big] \le \phi^{\dim(\mathcal{B})} \epsilon^{\dim(\mathcal{B})}. \tag{11}$$

Recall that

$$M(J)^i_j = \begin{cases} 1 & \text{if } j = J_t \in [n] \text{ for some } t \in [d] \text{ and } i \le t, \\ 0 & \text{otherwise.} \end{cases}$$



We will imagine drawing the random entries of $M(J) \circ V$ in $d$ stages. In the $t$'th stage we draw the
$t$ entries $(M(J) \circ V)^{1..t}_{J_t}$, unless $J_t = \perp$ in which case we "skip" the $t$'th stage. By the independence
of the entries, the following claim immediately implies (11):

Claim: Assume $t \in [d]$ has $J_t \neq \perp$. Suppose we have completed the first $t - 1$ stages of drawing
$M(J) \circ V$. Then whether the event $\mathrm{IN}_t$ occurs is determined in the $t$'th stage, and its probability
is at most $\phi^t \epsilon^t$.

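To prove the claim we write $b \in \mathbb{R}^t$ for the base point of $B_t$ and observe that

$$\begin{aligned}
\mathrm{IN}_t &\iff Vx - (M(J) \circ V) A_t \in B_t \\
&\iff (M(J) \circ V)^{1..t} x + (\overline{M}(J) \circ \mathsf{V})^{1..t} x - (M(J) \circ V)^{1..t} A_t \in b + [0, \epsilon)^t \\
&\iff (M(J) \circ V)^{1..t} (x - A_t) \in \big(b - (\overline{M}(J) \circ \mathsf{V})^{1..t} x\big) + [0, \epsilon)^t.
\end{aligned} \tag{12}$$

Recalling the definition of $M(J)$ we see that for a fixed $i \in [t]$,

\begin{align*}
(M(J) \circ V)^i (x - A_t) = {} & \sum_{i \le u < t \,:\, J_u \neq \perp} (M(J) \circ \mathsf{V})^i_{J_u} (x - A_t)^{J_u} \tag{13} \\
& + (M(J) \circ V)^i_{J_t} (x - A_t)^{J_t} \tag{14} \\
& + \sum_{u > t \,:\, J_u \neq \perp} (M(J) \circ V)^i_{J_u} (x - A_t)^{J_u}. \tag{15}
\end{align*}

Please note that in (13) we have written $M(J) \circ \mathsf{V}$ rather than $M(J) \circ V$ because the entries
$(M(J) \circ V)^i_{J_u}$ for $u < t$ have been fixed prior to the $t$'th stage. The entries of $M(J) \circ V$ appearing
in (14) and (15), however, are still to be drawn.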

At this point it may seem as though the event $\mathrm{IN}_t$ as given in (12) depends not only on the
entries $(M(J) \circ V)^{1..t}_{J_t}$ as stated in the claim, but also on the entries $(M(J) \circ V)^{1..t}_{J_u}$ for $u > t$.
But this is where we make a crucial observation; indeed, the one which explains why we defined
Witness to produce diagonalization matrices. By definition of $A$ diagonalizing $x$ on $J$,

$$(x - A_t)^j = \begin{cases} \pm 1 & \text{if } j = J_t, \\ 0 & \text{if } j = J_u \in [n] \text{ for some } u > t. \end{cases}$$

(If $j = J_u \in [n]$ for some $u < t$ then we cannot say anything about $(x - A_t)^j$, but we do not need
to.) Substituting this into (14) and (15), we deduce that

$$(M(J) \circ V)^i (x - A_t) = \text{constant} \pm (M(J) \circ V)^i_{J_t}. \tag{16}$$

In particular, the term (15) has dropped out; hence event (12) does not in fact depend on the
entries $(M(J) \circ V)^i_{J_u}$ for $u > t$, as claimed. Finally, substituting (16) into (12) we see that the
event $\mathrm{IN}_t$ is equivalent to a conjunction of $t$ events of the form

$$\pm (M(J) \circ V)^i_{J_t} \in [c_i, c_i + \epsilon),$$

where the $c_i$'s are fixed constants. Since the random variables $(M(J) \circ V)^i_{J_t}$ are independent and
have pdf's bounded by $\phi$, we conclude that the probability of $\mathrm{IN}_t$ is indeed at most $(\phi\epsilon)^t$, as
claimed. ■

7 The Counting Lemma



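Here we restate and prove the Counting Lemma.

Counting Lemma. For a fixed $n$ and $\epsilon$, the quantity

$$\sum_{\text{possible testimonies } (J, A, \mathcal{B})} \phi^{\dim(\mathcal{B})} \epsilon^{\dim(\mathcal{B})} \tag{17}$$

is at most $2 \cdot (4d\phi)^{d(d+1)/2} \cdot n^{2d}$.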

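Proof: For a given index vector $J$ let us define the following quantities:

$$\mathrm{count}(J) = \#\{t : J_t \neq \perp\}, \quad \mathrm{sum}(J) = \sum\{t : J_t \neq \perp\}, \quad \max(J) = \max\{t : J_t \neq \perp\}.$$

Observe that for a possible testimony $(J, A, \mathcal{B})$, the quantity $\mathrm{sum}(J)$ is identical to $\dim(\mathcal{B})$. We
may therefore express (17) as

$$\sum_{\text{possible } J} \phi^{\mathrm{sum}(J)} \epsilon^{\mathrm{sum}(J)} \cdot \#\{(A, \mathcal{B}) : (J, A, \mathcal{B}) \text{ is a possible testimony}\}. \tag{18}$$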



Let us now count the pairs $(A, \mathcal{B})$ that form possible testimonies with $J$. By Proposition 4.2 we know
that $A$ must diagonalize some solution $x$ on $J$. There are $2^{\mathrm{count}(J)}$ choices for the values of $x^j$, for $j$
appearing in $J$. These force some entries of $A$; the remaining $\sum\{t - 1 : J_t \neq \perp\} = \mathrm{sum}(J) - \mathrm{count}(J)$
entries are free. Thus there are

$$2^{\mathrm{count}(J)} \cdot 2^{\mathrm{sum}(J) - \mathrm{count}(J)} = 2^{\mathrm{sum}(J)} \text{ possible choices for } A. \tag{19}$$



As for $\mathcal{B}$, let us first count the number of possibilities for $B_{\max(J)}$ (assuming $\max(J)$ exists).
We write $m = \max(J)$ for brevity; on first reading, one should think of $m$ as always being $d$. An
execution of Witness$(x, V)$ which is consistent with $J$ and $A$ defines $B_m$ to be the $m$-box containing
the point $p = Vx - (M(J) \circ V) A_m$. Since the entries of $V$ are bounded in $[-1, 1]$ always and since
each row of $M(J)$ contains at most $d$ nonzero entries, the point $p$ must lie in $[-n - d, n + d)^m$.¹ There are
therefore at most $(2(n + d)/\epsilon)^m$ choices for the box $B_m$.

We could similarly upper-bound the number of choices for each remaining $t$-box by $(2(n + d)/\epsilon)^t$;
however, this would lead to a final count whose dependence on $d$ was $n^{d + d(d+1)/2}$, rather than $n^{2d}$.
To get the much better dependence of $n^{2d}$ we observe that once $B_m$ is chosen, the remaining $t$-boxes
cannot be "too far away" because, like $B_m$, they contain a point close to $Vx$. More precisely, let
$t < m$ be such that $J_t \neq \perp$ and consider $B_t$. It is the $t$-box containing $p' = Vx - (M(J) \circ V) A_t$.
Now $p - p' = (M(J) \circ V)(A_t - A_m)$, which means that $\|p - p'\|_\infty \le d$. It follows that given the choice
of $B_m$, there are at most $((2d + 1)/\epsilon)^t$ choices for $B_t$. We conclude that the number of possible
choices for $\mathcal{B}$ is at most

$$\left(\frac{2(n + d)}{\epsilon}\right)^{m} \cdot \prod_{t < \max(J) \,:\, J_t \neq \perp} \left(\frac{2d + 1}{\epsilon}\right)^{t}
= \left(\frac{2(n + d)}{2d + 1}\right)^{m} \prod_{t \,:\, J_t \neq \perp} \left(\frac{2d + 1}{\epsilon}\right)^{t}
\le \left(\frac{2(n + d)}{2d + 1}\right)^{d} \left(\frac{2d + 1}{\epsilon}\right)^{\mathrm{sum}(J)}.$$

¹ Proving that $p$ cannot have any coordinate exactly equal to $n + d$ is an exercise for the reader.

Combining this with (19) and substituting into (18), we upper-bound (17) by

$$\sum_{\text{possible } J} (2(2d + 1)\phi)^{\mathrm{sum}(J)} \, (2(n + d)/(2d + 1))^{d}.$$

Finally, we simply upper-bound $\mathrm{sum}(J)$ by $d(d + 1)/2$ and the number of possible $J$ by $(n + 1)^d$.
We conclude that (17) is at most

$$(n + 1)^d \, (2(2d + 1)\phi)^{d(d+1)/2} \, (2(n + d)/(2d + 1))^d = (4\phi)^{d(d+1)/2} \, (d + 1/2)^{d(d-1)/2} \, (n + 1)^d \, (n + d)^d.$$

One may check that $(d + 1/2)^{(d-1)/2} (n + 1)(n + d) \le 2^{1/d} d^{(d+1)/2} n^2$ for any $d \ge 1$ and $n \ge 3$ (which
we may assume, as our final bound is always at least $2^3$). Hence (17) is indeed at most

$$2 (4d\phi)^{d(d+1)/2} n^{2d},$$

as claimed. ■
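
The final algebra in this proof can be sanity-checked numerically. The sketch below uses exact rational arithmetic to compare, over a small grid of parameters chosen by us, the intermediate bound, its rewritten form, and the bound claimed in the Counting Lemma; the function names and the grid are our own, and a finite check of this kind is of course no substitute for the "one may check" step.

```python
from fractions import Fraction

def intermediate(n, d, phi):
    """(n+1)^d * (2(2d+1)phi)^{d(d+1)/2} * (2(n+d)/(2d+1))^d."""
    T = d * (d + 1) // 2
    return ((n + 1) ** d * (2 * (2 * d + 1) * phi) ** T
            * Fraction(2 * (n + d), 2 * d + 1) ** d)

def rewritten(n, d, phi):
    """(4 phi)^{d(d+1)/2} * (d+1/2)^{d(d-1)/2} * (n+1)^d * (n+d)^d."""
    T = d * (d + 1) // 2
    return ((4 * phi) ** T * Fraction(2 * d + 1, 2) ** (d * (d - 1) // 2)
            * (n + 1) ** d * (n + d) ** d)

def claimed(n, d, phi):
    """2 * (4 d phi)^{d(d+1)/2} * n^{2d}, the bound in the Counting Lemma."""
    return 2 * (4 * d * phi) ** (d * (d + 1) // 2) * n ** (2 * d)

# Grid of test parameters (an arbitrary choice for illustration).
cases = [(n, d, phi)
         for d in range(1, 7)
         for n in range(3, 31)
         for phi in (Fraction(1, 2), Fraction(1), Fraction(7))]

identity_holds = all(intermediate(*c) == rewritten(*c) for c in cases)
bound_holds = all(rewritten(*c) <= claimed(*c) for c in cases)
```

The tightest case on this grid is $d = 1$, $n = 3$, where the comparison reduces to $(n + 1)^2 = 16 \le 2n^2 = 18$.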

8 Conclusion 

There are several open problems that remain. One intriguing problem is to show a lower bound
for the expected number of Pareto optima in which the exponent on $n$ grows with $d$. Currently we
cannot rule out the possibility of an upper bound of the form $f(d, \phi) n^2$; however we regard this
possibility as unlikely. We feel it is likely that there is a lower bound of at least $\Omega(n^d)$ for constant
$d$ and $\phi$; our intuition is partly based on the known lower bound of $\Omega(n^d)$ in the scenario of $2^n$
completely independent points uniformly distributed on $[-1, 1]^{d+1}$.

Another interesting open problem is whether our methods can be used to give improved upper
bounds on the higher moments of the number of Pareto optima in the smoothed analysis model.
This is currently unclear; we know of no bounds that improve on those of Röglin and Teng [RT09].
Finally, one could ask about reducing the factor of $(\phi d)^{d(d+1)/2}$ in our bound, as well as whether
our results extend to the case of solutions in $\{0, 1, 2, \ldots, c\}^n$ for integer constants $c > 1$.